
98 research outputs found

Finding motif pairs in the interactions between heterogeneous proteins via bootstrapping and boosting

Author: C Alfarano
De-Shuang Huang
DL Olson
G Dupret
H Yu
J Kim
Jisu Kim
Kyungsook Han
N Deshpande
N Littlestone
NE Davey
R Jansen
RE Schapire
SM Gomez
V Neduva
WR Taylor
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract
Background: Supervised learning and many stochastic methods for predicting protein-protein interactions require both negative and positive interactions in the training data set. Unlike positive interactions, negative interactions cannot be readily obtained from interaction data, so they must be generated. For protein-protein interactions, as for other molecular interactions, treating all non-positive interactions as negative yields far more negative interactions than positive ones. Random selection from the non-positive interactions is also unsuitable, since the selected data may not reflect the original distribution of the data.
Results: We developed a bootstrapping algorithm for generating a negative data set of arbitrary size from protein-protein interaction data. We also developed an efficient boosting algorithm for finding interacting motif pairs in human and virus proteins. The boosting algorithm showed the best performance (84.4% sensitivity and 75.9% specificity) with balanced positive and negative data sets. The boosting algorithm was also used to find potential motif pairs in complexes of human and virus proteins, for which structural data were not used to train the algorithm. Interacting motif pairs common to multiple folds of structural data for the complexes were found to be statistically significant. The data set for interactions between human and virus proteins was extracted from BOND and is available at http://virus.hpid.org/interactions.aspx. The complexes of human and virus proteins were extracted from PDB, and their identifiers are available at http://virus.hpid.org/PDB_IDs.html.
Conclusion: When the positive and negative training data sets are unbalanced, the predictions of the model tend to be biased. Bootstrapping is effective for generating a negative data set whose size and distribution are easily controlled. Our boosting algorithm, trained with balanced data sets generated via the bootstrapping method, could efficiently predict interacting motif pairs from protein interaction and sequence data.
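The abstract describes a bootstrapping algorithm that generates a negative data set of arbitrary size while keeping its distribution under control, but it does not give the procedure. The Python sketch below is therefore a hypothetical illustration, not the authors' algorithm. Both sides of each negative (human, virus) pair are resampled with replacement from the proteins observed in the positive pairs, and any pair that is a known interaction is rejected. The function name `bootstrap_negatives` and the `max_tries` guard are illustrative assumptions.

```python
import random

def bootstrap_negatives(positive_pairs, n_negatives, seed=0, max_tries=100000):
    """Generate n_negatives (human, virus) pairs that are absent from positive_pairs.

    Hypothetical sketch: each side of a negative pair is resampled, with
    replacement, from the proteins seen on that side of the positive pairs.
    """
    positives = set(positive_pairs)
    # One list entry per occurrence, so frequently interacting proteins are drawn more often
    humans = [h for h, _ in positive_pairs]
    viruses = [v for _, v in positive_pairs]
    rng = random.Random(seed)
    negatives = []
    tries = 0
    while len(negatives) < n_negatives:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not find enough non-positive pairs")
        pair = (rng.choice(humans), rng.choice(viruses))
        if pair not in positives:  # reject known interactions
            negatives.append(pair)
    return negatives
```

Each side is drawn from the list of occurrences rather than from the set of distinct proteins, so the negative set roughly follows the protein frequencies of the positive set, unlike uniform random selection. Because sampling is with replacement, any size can be requested, and pairs may repeat.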

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Prediction of HIV-1 virus-host protein interactions using virus and host sequence motifs

Author: A Mehle
A Sodhi
AL Brass
Aydin Tozeren
BS Ramakrishna
E de Castro
EC Holmes
F Diella
FP Davis
G Dennis Jr
GZ Panos
H Dinkel
H Shelton
H Zhou
J Hemelaar
JF Roeth
JN Brown
JN Tournier
JR Morgan
K Kadaveru
Lyle Ungar
MD Dyer
N Evrard-Todeschi
N Hulo
O Tastan
P Patel
P Puntervoll
Perry Evans
R Byland
R König
R Tonikian
R Truant
RG Ptak
RJ Edwards
S Peri
SH Coleman
SH Tan
T Kuntzen
V Neduva
VR Panz
W Dampier
W Fu
W Lv
William Dampier
Publication venue: BioMed Central
Publication date: 01/05/2009

Abstract
Background: Host protein-protein interaction networks are altered by invading virus proteins, which create new interactions and modify or destroy others. The resulting network topology favors excessive amounts of virus production in a stressed host cell network. Short linear peptide motifs common to both virus and host provide the basis for host network modification.
Methods: We focused our host-pathogen study on the binding and competing interactions of HIV-1 and human proteins. We showed that peptide motifs conserved across 70% of HIV-1 subtype B and C samples occurred in similar positions on HIV-1 proteins, and we documented protein domains that interact with these conserved motifs. We predicted which human proteins may be targeted by HIV-1 by identifying pairs of human proteins that may interact via a motif conserved in HIV-1 and its corresponding interacting protein domain.
Results: Our predictions were enriched with host proteins known to interact with the HIV-1 proteins ENV, NEF, and TAT (p-value < 4.26E-21). Cellular pathways statistically enriched for our predictions include the T cell receptor signaling, natural killer cell mediated cytotoxicity, cell cycle, and apoptosis pathways. Gene Ontology molecular function level 5 categories enriched with both predicted and confirmed HIV-1 targeted proteins included categories associated with phosphorylation events and adenyl ribonucleotide binding.
Conclusion: A list of host proteins highly enriched with those targeted by HIV-1 proteins can be obtained by searching for host protein motifs along virus protein sequences. The resulting set of host proteins predicted to be targeted by virus proteins will become more accurate with better annotations of motifs and domains. Nevertheless, our study validates the role of linear binding motifs shared by virus and host proteins as an important part of the crosstalk between virus and host.
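The conservation criterion above (motifs present in at least 70% of HIV-1 subtype B and C samples) can be illustrated with a minimal Python sketch. It assumes that motifs are written as regular expressions in the style of ELM patterns (for example `P..P` for a PxxP motif) and that each sample is a plain protein sequence string. The helper name `conserved_motifs`, the patterns, and the sequences are hypothetical, and the sketch ignores the positional comparison described in the abstract.

```python
import re

def conserved_motifs(motif_patterns, sequences, threshold=0.7):
    """Return {motif name: fraction of sequences matched} for every motif that
    matches at least `threshold` of the sequences (hypothetical helper)."""
    if not sequences:
        raise ValueError("need at least one sequence")
    conserved = {}
    for name, pattern in motif_patterns.items():
        regex = re.compile(pattern)
        # Count sequences that contain at least one match anywhere
        hits = sum(1 for seq in sequences if regex.search(seq))
        fraction = hits / len(sequences)
        if fraction >= threshold:
            conserved[name] = fraction
    return conserved

# Hypothetical motifs and toy sequences, not real HIV-1 samples
patterns = {"PxxP": "P..P", "RGD": "RGD"}
samples = ["AAPKLPAA", "PTTPG", "GGGG", "APQRPG"]
print(conserved_motifs(patterns, samples))  # prints {'PxxP': 0.75}
```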

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ScholarlyCommons@Penn

Large-Scale Discovery and Characterization of Protein Regulatory Motifs in Eukaryotes

Author: A Aitken
A Belle
A Remenyi
AM Benham
B Martoglio
C Stark
CE Lawrence
CL Denis
D Chelsky
D Kalderon
D Schwartz
Daniel S. Lieber
E Birney
EC Hurt
F Diella
FN Vogtle
G Blobel
H Dinkel
H Goodarzi
H Yu
I Jonassen
I Rigoutsos
J Ptacek
J Rush
JC Semenza
M Fuxreiter
M Gstaiger
MA Beer
MN Hall
N Slonim
NE Davey
O Elemento
Olivier Elemento
P Puntervoll
P Young
RB Russell
RJ Edwards
S Balla
S Subramani
Saeed Tavazoie
SB Ficarro
Sridhar Hannenhalli
TL Bailey
TM Cover
V Neduva
VD Rao
WK Huh
X Xie
Y Gavel
Publication venue: Public Library of Science
Publication date: 01/01/2010

The increasing ability to generate large-scale, quantitative proteomic data has brought with it the challenge of analyzing such data to discover the sequence elements that underlie systems-level protein behavior. Here we show that short, linear protein motifs can be efficiently recovered from proteome-scale datasets such as sub-cellular localization, molecular function, half-life, and protein abundance data using an information theoretic approach. Using this approach, we have identified many known protein motifs, such as phosphorylation sites and localization signals, and discovered a large number of candidate elements. We estimate that ∼80% of these are novel predictions in that they do not match a known motif in both sequence and biological context, suggesting that post-translational regulation of protein behavior is still largely unexplored. These predicted motifs, many of which display preferential association with specific biological pathways and non-random positioning in the linear protein sequence, provide focused hypotheses for experimental validation.
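One basic quantity behind an information theoretic search of this kind is the mutual information between a motif's presence in each protein and a discrete protein property such as sub-cellular localization. The minimal Python sketch below computes it; it is an illustration under that assumption, not the published method, and the function name and example data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # Sum of p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed (x, y)
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# Hypothetical data: motif present (1) or absent (0) in four proteins,
# and each protein's sub-cellular localization
motif_present = [1, 1, 0, 0]
localization = ["nucleus", "nucleus", "cytoplasm", "cytoplasm"]
print(mutual_information(motif_present, localization))  # prints 1.0
```

A motif whose presence perfectly separates two equally sized groups carries 1 bit about the property, and a motif distributed independently of it carries 0 bits. Ranking candidate motifs by this score, with an appropriate significance test, is one way such a search could be organized.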

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Novel Peptide-Mediated Interactions Derived from High-Resolution 3-Dimensional Structures

Many biological responses to intra- and extracellular stimuli are regulated through complex networks of transient protein interactions where a globular domain in one protein recognizes a linear peptide from another, creating a relatively small contact interface. These peptide stretches are often found in unstructured regions of proteins, and contain a consensus motif complementary to the interaction surface displayed by their binding partners. While most current methods for the de novo discovery of such motifs exploit their tendency to occur in disordered regions, our work here focuses on another observation: upon binding to their partner domain, motifs adopt a well-defined structure. Indeed, through the analysis of all peptide-mediated interactions of known high-resolution three-dimensional (3D) structure, we found that the structure of the peptide may be as characteristic as the consensus motif, and can help identify target peptides even when they do not match the established patterns. Our analyses of the structural features of known motifs reveal that they tend to have a particular stretched and elongated structure, unlike most other peptides of the same length. Accordingly, we have implemented a strategy based on a Support Vector Machine that uses these features, along with other structure-encoded information about binding interfaces, to search the set of protein interactions of known 3D structure and to identify unnoticed peptide-mediated interactions among them. We have also derived consensus patterns for these interactions, whenever enough information was available, and compared our results with established linear motif patterns and their binding domains. Finally, to cross-validate our identification strategy, we scanned interactome networks from four model organisms with our newly derived patterns to see if any of them occurred more often than expected.
Indeed, we found significant over-representations for 64 domain-motif interactions, 46 of which had not been described before, involving over 6,000 interactions in total for which we could suggest the molecular details determining the binding.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Short Co-occurring Polypeptide Regions Can Predict Global Protein Interaction Maps

Author: A Ben-Hur
C Stark
CY Yu
D Betel
E Andres Leon
E Jain
EA Winzeler
H Jeong
H Yu
J Shen
M Jessulat
N Zaki
R Nussinov
S Martin
S Pitre
T Yoko-o
TSK Prasad
V Neduva
Y Guo
Y Park
Publication venue: Nature Publishing Group
Publication date: 19/04/2012

A goal of the post-genomics era has been to elucidate a detailed global map of protein-protein interactions (PPIs) within a cell. Here, we show that the presence of co-occurring short polypeptide sequences between interacting protein partners appears to be conserved across different organisms. We present an algorithm to automatically generate PPI prediction method parameters for various organisms and illustrate that global PPIs can be predicted from previously reported PPIs within the same or a different organism using protein primary sequences. The PPI prediction code is further accelerated through the use of parallel multi-core programming, which improves its usability for large-scale or proteome-wide PPI prediction. We predict and analyze hundreds of novel human PPIs, experimentally confirm protein functions and, importantly, predict the first genome-wide PPI maps for S. pombe (∼9,000 PPIs) and C. elegans (∼37,500 PPIs).
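The idea that interacting partners share co-occurring short polypeptide regions can be illustrated with a deliberately simplified Python sketch. This is not the published prediction algorithm. Exact k-mer matching is a simplifying assumption made here, and so are the names `kmers` and `cooccurrence_count` and the default `k=3`. For a query pair (A, B), the sketch counts known interacting pairs (X, Y) in which A shares a k-mer with one partner and B shares a k-mer with the other.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def cooccurrence_count(query_a, query_b, known_pairs, sequences, k=3):
    """Count known interacting pairs (x, y) such that query_a shares a k-mer with
    one partner and query_b shares a k-mer with the other (hypothetical helper)."""
    ka, kb = kmers(query_a, k), kmers(query_b, k)
    count = 0
    for x, y in known_pairs:
        kx, ky = kmers(sequences[x], k), kmers(sequences[y], k)
        # Check both orientations of the known pair
        if (ka & kx and kb & ky) or (ka & ky and kb & kx):
            count += 1
    return count

# Toy sequences and interactions, made up for illustration
seqs = {"X": "MKTAYIAK", "Y": "GGSWWQRL", "Z": "PPPPPPPP"}
known = [("X", "Y"), ("X", "Z")]
print(cooccurrence_count("AAKTAYQQ", "LLSWWQ", known, seqs))  # prints 1
```

A higher count means that more known interactions share regions with both query proteins. A realistic predictor would score similar rather than identical windows and would normalize for how common each region is.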

Crossref

Carleton University's Institutional Repository

PubMed Central

Predicting Protein Kinase Specificity: Predikin Update and Performance in the DREAM4 Challenge

Author: B Kobe
Boštjan Kobe
F Diella
G Manning
G Stolovitzky
G Zhu
GE Crooks
I Letunic
J Mok
J Schultz
Jonathan J. Ellis
K Nishikawa
LA Pinna
MEM Noble
NFW Saunders
P Puntervoll
RI Brinkworth
Richard James Morris
S Knuutila
SK Hanks
T Hunter
TD Schneider
V Neduva
Publication venue: Public Library of Science
Publication date: 01/01/2011

Predikin is a system for making predictions about protein kinase specificity. It was declared the “best performer” in the protein kinase section of the Peptide Recognition Domain specificity prediction category of the recent DREAM4 challenge (an independent test using unpublished data). In this article we discuss some recent improvements to the Predikin web server, including a more streamlined approach to substrate-to-kinase predictions and whole-proteome predictions, and give an analysis of Predikin's performance in the DREAM4 challenge. We also evaluate these improvements using a data set of yeast kinases that have been experimentally characterised, and we discuss the usefulness of Frobenius distance in assessing the predictive power of position weight matrices.
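The Frobenius distance between two matrices A and B of the same shape is the square root of the sum, over all positions i and residues j, of (A_ij - B_ij)^2. A minimal Python sketch for comparing a predicted and an observed position weight matrix follows. It assumes that each matrix is stored as a list of rows (one row per position) with residues in the same column order. The function name and example values are hypothetical.

```python
import math

def frobenius_distance(pwm_a, pwm_b):
    """Frobenius distance between two position weight matrices of equal shape.

    Each matrix is a list of rows (one per position); each row lists the
    per-residue values in the same residue order.
    """
    if len(pwm_a) != len(pwm_b) or any(len(r) != len(s) for r, s in zip(pwm_a, pwm_b)):
        raise ValueError("matrices must have the same shape")
    # Square root of the summed squared element-wise differences
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(pwm_a, pwm_b)
                         for a, b in zip(row_a, row_b)))

# Two-position, two-residue toy matrices that differ only at the second position
predicted = [[0.5, 0.5], [1.0, 0.0]]
observed = [[0.5, 0.5], [0.0, 1.0]]
print(frobenius_distance(predicted, observed))  # sqrt(2), about 1.414
```

A distance of 0 means that the matrices are identical. Smaller values indicate that a predicted matrix is closer to the experimentally derived one.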

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Queensland University of Technology ePrints Archive

Polycation-π Interactions Are a Driving Force for Molecular Recognition by an Intrinsically Disordered Oncoprotein Family

Author: A Arvand
A Bertolotti
A Bhattacherjee
A Boro
A Zarrine-Afsar
AB Sigalov
AG Turjanski
AH Mao
AK Dunker
AS Mahadevi
AS Reddy
AY Tan
BA Shoemaker
C Haynes
D Alex
D Vijay
ED Ross
F Jin
G Gill
GM Lee
Guanghong Wei
HJ Dyson
HS Ashbaugh
HS Chan
Hue Sun Chan
HV Erkizan
I Staneva
J Danielsson
J Kim
J Singh
J Wang
JC Hansen
JC Ma
JF Rual
Jianhui Song
JMR Baker
JP Gallivan
JS Barber-Rotenberg
JS Rao
JW Caldwell
Kevin A. W. Lee
KP Ng
L Feng
LM Iakoucheva
LM Salonen
M Azuma
M Borg
M Fuxreiter
M Levitt
MA Pufall
MS Cortese
MS Marshall
P Nash
P Tompa
PB Crowley
Peter Tompa
PJ Grohar
Q Lu
R Bachmaier
R Janknecht
R Petermann
R Wu
S Abeln
Sheung Chun Ng
SK Burley
SM Butterfield
T Mittag
V Neduva
VJ Hilser
VN Uversky
W Wang
W Zhong
X Chu
X Xiu
Y Huang
Z Zhang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

Molecular recognition by intrinsically disordered proteins (IDPs) commonly involves specific localized contacts and target-induced disorder-to-order transitions. However, some IDPs remain disordered in the bound state, a phenomenon termed "fuzziness", often characterized by IDP polyvalency, sequence-insensitivity and a dynamic ensemble of disordered bound-state conformations. Beyond these general features, specific biophysical models for fuzzy interactions are mostly lacking. The transcriptional activation domain of the Ewing's Sarcoma oncoprotein family (EAD) is an IDP that exhibits many features of fuzziness, with multiple EAD aromatic side chains driving molecular recognition. Considering the prevalent role of cation-π interactions at various protein-protein interfaces, we hypothesized that EAD-target binding involves polycation-π contacts between a disordered EAD and basic residues on the target. Herein we evaluated the polycation-π hypothesis via functional and theoretical interrogation of EAD variants. The experimental effects of a range of EAD sequence variations, including aromatic number, aromatic density and charge perturbations, all support the cation-π model. Moreover, the activity trends observed are well captured by a coarse-grained EAD chain model and a corresponding analytical model based on interaction between EAD aromatics and surface cations of a generic globular target. EAD-target binding, in the context of pathological Ewing's Sarcoma oncoproteins, is thus seen to be driven by a balance between EAD conformational entropy and favorable EAD-target cation-π contacts. Such a highly versatile mode of molecular recognition offers a general conceptual framework for promiscuous target recognition by polyvalent IDPs. © 2013 Song et al.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Repository of the Academy's Library

FigShare

Rice_Phospho 1.0: a new rice-specific SVM predictor for protein phosphorylation sites

Author: A Palmeri
AH Gandomi
B Petersen
BR Chitteti
CR Ingrell
GK Agrawal
H He
H Nakagami
HD Huang
J Gao
JC Obenauer
JH Kim
JL Heazlewood
K Chen
KC Chou
L Breiman
LM Iakoucheva
M Hall
M Sikic
MM Aziz
N Blom
P Han
R Kumar
S Que
SW Chang
V Neduva
X Chen
XW Chen
XW Zhao
Y Ban
Y Ke
Y Xue
YZ Chen
Z Chen
Publication venue: Springer Science and Business Media LLC
Publication date: 07/07/2015

Experimentally determined or computationally predicted protein phosphorylation sites for distinct species are becoming increasingly common. In this paper, we compare the predictive performance of a novel classification algorithm with different encoding schemes to develop a rice-specific protein phosphorylation site predictor. Our results imply that the combination of Amino acid occurrence Frequency with Composition of K-Spaced Amino Acid Pairs (AF-CKSAAP) provides the best description of relevant sequence features that surround a phosphorylation site. A support vector machine (SVM) using AF-CKSAAP achieves the best performance in classifying rice protein phosphorylation sites when compared with the other algorithms. We have used SVM with AF-CKSAAP to construct a rice-specific protein phosphorylation site predictor, Rice-Phospho 1.0 (http://bioinformatics.fafu.edu.cn/rice-phospho1.0). We measure the Accuracy (ACC) and Matthews Correlation Coefficient (MCC) of Rice-Phospho 1.0 to be 82.0% and 0.64, significantly higher than those of other predictors such as Scansite, Musite, PlantPhos and PhosphoRice. Rice-Phospho 1.0 also successfully predicted the experimentally identified phosphorylation sites in LOC-Os03g51600.1, a protein sequence that did not appear in the training dataset. In summary, Rice-Phospho 1.0 outputs reliable predictions of protein phosphorylation sites in rice and will serve as a useful tool for the community.
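The AF-CKSAAP encoding named above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Rice-Phospho 1.0 code. The AF part is the fraction of each of the 20 standard amino acids in a sequence window. For each gap k, the CKSAAP part counts every ordered residue pair separated by k residues and divides that count by the number of such positions, len(window) - k - 1. The maximum gap `k_max`, the window length, and the helper names are hypothetical choices.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

def aa_frequency(window):
    """Fraction of each of the 20 standard amino acids in the window."""
    if not window:
        raise ValueError("empty window")
    return [window.count(aa) / len(window) for aa in AMINO_ACIDS]

def cksaap(window, k_max=3):
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    Pairs containing a non-standard residue (e.g. padding 'X') are not counted
    but still contribute to the denominator.
    """
    features = []
    for k in range(k_max + 1):
        total = len(window) - k - 1  # number of pair positions with gap k
        counts = dict.fromkeys(PAIRS, 0)
        for i in range(max(total, 0)):
            pair = window[i] + window[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        features.extend(counts[p] / total if total > 0 else 0.0 for p in PAIRS)
    return features

def af_cksaap(window, k_max=3):
    """Concatenate AF (20 values) and CKSAAP (400 * (k_max + 1) values)."""
    return aa_frequency(window) + cksaap(window, k_max)

# A toy window; real windows would be centred on a candidate S/T/Y site
vector = af_cksaap("ASAS", k_max=1)
print(len(vector))  # 20 + 2 * 400 = 820
```

The resulting fixed-length vector is what an SVM would consume, with one vector per candidate site. With k_max = 3, every window maps to 1,620 features regardless of the window length.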

University of Essex Research Repository

Crossref

PubMed Central

HIV Protein Sequence Hotspots for Crosstalk with Host Hub Proteins

Author: A Greenway
A Henschel
AC Vendel
AK Dunker
Aydin Tozeren
B Ackerson
CL Ruegg
CM Gould
D Ekman
Denis Dupuy
DR Borger
EE Hill
EF Pettersen
F Cardarelli
F Diella
F Meggio
G Baier-Bitterlich
H Jian
H Li
H Wang
HM Berman
HM Craig
J Friborg
JE Dickerson
K Harada
K Kadaveru
K Saksela
KV Prasad
L Deng
M Hiipakka
M Matsubara
M Schindler
MA Dimattia
MA Perez
Mahdi Sarmady
MD Dyer
MR Schaefer
N Arhel
NE Davey
O Haffar
O Tastan
P Abada
P Bayer
P Beauparlant
P Evans
RJ Edwards
S Balakrishnan
S Betzi
S Grzesiek
S Sei
SH Tan
SK Srinivas
SS Chen
T Ammosova
T Kino
TH Tahirov
TS Keshava Prasad
V Neduva
W Fu
W Radding
William Dampier
X Yang
Y He
Y Liu
Z Nie
Publication venue: Public Library of Science
Publication date: 01/01/2011

HIV proteins target host hub proteins for transient binding interactions. The presence of viral proteins in the infected cell results in out-competition of host proteins in their interaction with hub proteins, drastically affecting cell physiology. Functional genomics and interactome datasets can be used to quantify the sequence hotspots on the HIV proteome mediating interactions with host hub proteins. In this study, we used the HIV and human interactome databases to identify HIV-targeted host hub proteins and their host binding partners (H2). We developed a high-throughput computational procedure utilizing motif discovery algorithms on sets of protein sequences, including sequences of HIV and H2 proteins. We identified as HIV sequence hotspots those linear motifs that are highly conserved on HIV sequences and at the same time have a statistically enriched presence on the sequences of H2 proteins. The HIV protein motifs discovered in this study are expressed by subsets of H2 host proteins potentially outcompeted by HIV proteins. A large subset of these motifs is involved in cleavage, nuclear localization, phosphorylation, and transcription factor binding events. Many such motifs are clustered on an HIV sequence in the form of hotspots. The sequential positions of these hotspots are consistent with the curated literature on phenotype-altering residue mutations, as well as with existing binding site data. The hotspot map produced in this study is the first global portrayal of HIV motifs involved in altering the host protein network at highly connected hub nodes.
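The abstract does not say which test was used to call a motif's presence on H2 sequences "statistically enriched". One common choice for this kind of question is a one-sided hypergeometric test. The Python sketch below uses it as an assumption, not as the study's method. Given a background of N proteins, K of which carry the motif, it returns the probability of seeing at least k carriers in a set of n proteins drawn without replacement. The function name and argument order are hypothetical.

```python
from math import comb

def hypergeometric_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N total, K carriers, n drawn).

    Hypothetical reading: N background proteins, K of them carry the motif;
    n proteins in the H2 set, k of them carry the motif.
    """
    if not (0 <= K <= N and 0 <= n <= N and k >= 0):
        raise ValueError("invalid counts")
    upper = min(n, K)
    # math.comb returns 0 when choosing more items than available
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Toy numbers: both of the 2 motif carriers among 4 proteins land in a set of 2
print(hypergeometric_enrichment_p(2, 2, 2, 4))  # 1/6, about 0.167
```

Exact integer arithmetic with `math.comb` avoids overflow for proteome-sized counts. When many motifs are tested, the resulting p-values would still need a multiple-testing correction.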

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Lentiviral Hematopoietic Stem Cell Gene Therapy in Patients with Wiskott-Aldrich Syndrome.

Author: Aiuti A
Assanelli A
Banerjee PP
Baricordi C
Benati C
Biasco L
Biffi A
Bosticardo M
Calabria A
Callegaro L
Casiraghi M
Castiello MC
Cicalese MP
Ciceri F
Di Nunzio S
Di Serio C
Dionisio F
Dow DJ
Evangelio C
Ferrua F
Finocchi A
Galimberti S
Galy A
Gardner J
Giannelli S
Mehta N
Metin A
Miniero R
Montini E
Naldini L
Neduva V
Orange JS
Pellin D
Rizzardi P
Roncarolo MG
Scaramuzza S
Schmidt M
Valsecchi MG
Villa A
Von Kalle C
Publication venue: American Association for the Advancement of Science
Publication date: 01/01/2013

Wiskott-Aldrich syndrome (WAS) is an inherited immunodeficiency caused by mutations in the gene encoding WASP, a protein regulating the cytoskeleton. Hematopoietic stem/progenitor cell (HSPC) transplants can be curative, but when matched donors are unavailable, infusion of autologous HSPCs modified ex vivo by gene therapy is an alternative approach. We used a lentiviral vector encoding functional WASP to genetically correct HSPCs from three WAS patients and reinfused the cells after a reduced-intensity conditioning regimen. All three patients showed stable engraftment of WASP-expressing cells and improvements in platelet counts, immune functions, and clinical scores. Vector integration analyses revealed highly polyclonal and multilineage haematopoiesis resulting from the gene-corrected HSPCs. Lentiviral gene therapy did not induce selection of integrations near oncogenes, and no aberrant clonal expansion was observed after 20 to 32 months. Although extended clinical observation is required to establish long-term safety, lentiviral gene therapy represents a promising treatment for WAS.

HAL Evry

PubMed Central

HAL Descartes

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

ART

Hal-Diderot

Archivio istituzionale della ricerca - Università di Padova